Source-Language Dictionaries Help Non-Expert Users to Enlarge Target-Language Dictionaries for Machine Translation

Authors

Víctor M. Sánchez-Cartagena

Miquel Esplà-Gomis

Juan Antonio Pérez-Ortiz

Abstract

This paper extends previous work on the enlargement of monolingual dictionaries of rule-based machine translation systems by non-expert users to tackle the complete task of adding both source-language and target-language words to the monolingual dictionaries and the bilingual dictionary. In the original method, users validate whether some suffix variations of the word to be inserted are correct in order to find the most appropriate inflection paradigm. This method is now improved by taking advantage of the strong correlation detected between paradigms in both languages to reduce the search space of the target-language paradigm once the source-language paradigm is known. Results show that, when the source-language word has already been inserted, the system predicts the right target-language paradigm more accurately, and the number of queries posed to users is significantly reduced. Experiments also show that, when the source language and the target language are not closely related, it is only the source-language part-of-speech category, and not the rest of the information provided by the source-language paradigm, that helps to correctly classify the target-language word.
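The idea of narrowing the target-language search space once the source-language paradigm is known can be illustrated with a minimal sketch. It is not the authors' implementation: the paradigm names, the co-occurrence counts, and the function `rank_target_paradigms` are all hypothetical, and the ranking by conditional relative frequency is only one plausible way to exploit the correlation the abstract mentions.

```python
from collections import Counter

# Hypothetical counts of (source paradigm, target paradigm) pairs, as might
# be collected from entries already present in a bilingual dictionary.
PAIR_COUNTS = Counter({
    ("src_noun_s", "tgt_noun_s"): 80,
    ("src_noun_s", "tgt_noun_es"): 15,
    ("src_noun_s", "tgt_verb_ar"): 5,
    ("src_verb_ed", "tgt_verb_ar"): 90,
    ("src_verb_ed", "tgt_noun_s"): 10,
})


def rank_target_paradigms(source_paradigm, candidates):
    """Rank candidate target paradigms by estimated P(target | source).

    Candidates never seen with the given source paradigm are dropped,
    which shrinks the set the user would have to be queried about.
    If there is no evidence at all, every candidate is kept with score 0.0.
    """
    counts = {c: PAIR_COUNTS[(source_paradigm, c)] for c in candidates}
    total = sum(counts.values())
    if total == 0:
        return [(c, 0.0) for c in candidates]
    ranked = [(c, n / total) for c, n in counts.items() if n > 0]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = ["tgt_noun_s", "tgt_noun_es", "tgt_verb_ar", "tgt_adj_o"]
    for paradigm, score in rank_target_paradigms("src_noun_s", candidates):
        print(f"{paradigm}: {score:.2f}")
```

In a sketch like this, the user would then be asked to validate inflected forms only for the surviving candidates, starting with the highest-ranked one, which is where a reduction in the number of queries could come from.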

Similar resources

Multimodal Building of Monolingual Dictionaries for Machine Translation by Non-Expert Users

This paper explores a new approach to help non-expert users with no background in linguistics to add new words to a monolingual dictionary in a rule-based machine translation system. Our method aims at obtaining the correct paradigm which explains not only the particular surface form introduced by the user, but also the rest of the inflected forms of the word. An initial set of potential paradigms ...

Comparison of SYSTRAN and Google Translate for English→ Portuguese

Two machine translation (MT) systems, a statistical MT (SMT) system and a hybrid system (rule-based and SMT), were tested in order to compare their performance. The source language was English (EN) and the target language Portuguese (PT). The SMT tool made far fewer errors than the hybrid system. Major problem areas of both systems concerned the transfer of verb systems from source to tar...

Towards a Thesaurus of Predicates

We propose a thesaurus of predicates that can help to resolve pre-editing and/or post-editing problems in machine translation environments. It differs from earlier approaches such as conventional dictionaries in that we are aiming to link a wide range of near-synonyms and paraphrases. We are compiling such similar examples through both introspection and the use of translation data, giving us a ...

Generation of Bilingual Dictionaries using Comparable and Quasi Comparable Corpora

The amount of information available on the web is increasing rapidly. The number of internet users is also increasing every day. A significant share of internet users is monolingual. They want to express themselves in their native language and also seek information in it. Hence, multilingual content on the internet is also increasing at a rapid pace. There is a need for systems whic...

Papillon Lexical Database Project: Monolingual Dictionaries & Interlingual Links

This paper presents a new research and development project called Papillon. It started as a French-Japanese cooperation between laboratories GETA/CLIPS (Grenoble, France) and NII (Tokyo, Japan). Its goal is to build a multilingual lexical database and to extract from it digital bilingual dictionaries. The database is built with monolingual dictionaries, one for each language of the database, li...

Publication date: 2012
